discrimination risk
Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency
We present a novel statistical framework for analyzing stereotypes in large language models (LLMs) by systematically estimating the bias and variation in their generation. Current evaluation metrics in the alignment literature often overlook the randomness of stereotypes caused by the inconsistent generative behavior of LLMs. For example, this inconsistency can result in LLMs displaying contradictory stereotypes, including those related to gender or race, for identical professions across varied contexts. Neglecting such inconsistency could lead to misleading conclusions in alignment evaluations and hinder the accurate assessment of the risk of LLM applications perpetuating or amplifying social stereotypes and unfairness. This work proposes a Bias-Volatility Framework (BVF) that estimates the probability distribution function of LLM stereotypes. Specifically, since the stereotype distribution fully captures an LLM's generation variation, BVF enables the assessment of both the likelihood and the extent to which its outputs discriminate against vulnerable groups, thereby allowing for the quantification of the LLM's aggregated discrimination risk. Furthermore, we introduce a mathematical framework to decompose an LLM's aggregated discrimination risk into two components: bias risk and volatility risk, originating from the mean and the variation of the LLM's stereotype distribution, respectively. We apply BVF to assess 12 commonly adopted LLMs and compare their risk levels. Our findings reveal that: i) bias risk is the primary cause of discrimination risk in LLMs; ii) most LLMs exhibit significant pro-male stereotypes for nearly all careers; iii) alignment with reinforcement learning from human feedback lowers discrimination by reducing bias, but increases volatility; iv) discrimination risk in LLMs correlates with key socio-economic factors such as professional salaries. Finally, we emphasize that BVF can also be used to assess other dimensions of generation inconsistency's impact on LLM behavior beyond stereotypes, such as knowledge mastery.
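To make the bias/volatility decomposition concrete, here is a minimal numerical sketch. It assumes signed stereotype scores sampled across contexts (positive meaning the output disfavors the vulnerable group) and an illustrative risk functional E[max(S, 0)]; the abstract does not give the paper's exact estimator, so names and functional here are our assumptions.

```python
import numpy as np

def risk_decomposition(scores):
    """Toy decomposition of aggregated discrimination risk (illustrative,
    not the paper's exact estimator). `scores` are signed stereotype scores
    across contexts; positive means the output disfavors the vulnerable
    group. Aggregated risk is E[max(S, 0)]; bias risk is the risk the mean
    alone would incur; volatility risk is the remainder due to spread."""
    s = np.asarray(scores, dtype=float)
    aggregated = np.maximum(s, 0.0).mean()  # E[max(S, 0)]
    bias = max(s.mean(), 0.0)               # risk from the mean alone
    volatility = aggregated - bias           # >= 0 by Jensen's inequality
    return aggregated, bias, volatility

rng = np.random.default_rng(0)
# Zero-mean but inconsistent model: its risk comes entirely from volatility.
print(risk_decomposition(rng.normal(0.0, 0.3, 10_000)))
# Consistently biased model: its risk comes almost entirely from bias.
print(risk_decomposition(rng.normal(0.2, 0.05, 10_000)))
```

The second example illustrates finding (i): even a small but consistent mean shift dominates the aggregated risk, while a perfectly calibrated-on-average model can still carry risk through volatility alone.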
On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs
In human networks, nodes belonging to a marginalized group often have a disproportionate rate of unknown or missing features. This, in conjunction with graph structure and known feature biases, can cause graph feature imputation algorithms to predict values for unknown features that make the marginalized group's feature values more distinct from the dominant group's feature values than they are in reality. We call this distinction the discrimination risk. We prove that a higher discrimination risk can amplify the unfairness of a machine learning model applied to the imputed data. We then formalize a general graph feature imputation framework called mean aggregation imputation and theoretically and empirically characterize graphs in which applying this framework can yield feature values with a high discrimination risk. We propose a simple algorithm to ensure mean aggregation-imputed features provably have a low discrimination risk, while minimally sacrificing reconstruction error (with respect to the imputation objective). We evaluate the fairness and accuracy of our solution on synthetic and real-world credit networks.
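The core mechanism is easy to see in code. Below is a minimal sketch of one-shot mean-aggregation imputation on a graph, plus the group-mean gap whose inflation relative to ground truth is the notion of discrimination risk above; the single-pass variant and the function names are our simplifications of the paper's more general framework.

```python
import numpy as np

def mean_aggregation_impute(adj, x, known):
    """One-shot mean-aggregation imputation (simplified sketch).
    adj:   (n, n) adjacency matrix
    x:     (n,) feature vector; values at unobserved nodes are ignored
    known: (n,) boolean mask of observed features
    Each unknown node receives the mean feature of its observed neighbors
    (NaN if it has none; iterating would propagate values further)."""
    x_hat = x.astype(float).copy()
    for v in np.where(~known)[0]:
        nbrs = np.where(adj[v] > 0)[0]
        obs = nbrs[known[nbrs]]
        x_hat[v] = x[obs].mean() if len(obs) else np.nan
    return x_hat

def group_gap(x, group):
    """Gap between group feature means. Imputation that widens this gap
    beyond its true value is what the abstract calls discrimination risk."""
    return abs(x[group].mean() - x[~group].mean())
```

In a homophilous graph where the marginalized group has more missing values, imputed nodes inherit their (mostly in-group) neighbors' means, so `group_gap(x_hat, group)` can exceed `group_gap(x_true, group)`; that inflation is exactly what the proposed algorithm bounds.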
Prejudice and Volatility: A Statistical Framework for Measuring Social Discrimination in Large Language Models
Liu, Y., Yang, K., Qi, Z., Liu, X., Yu, Y., Zhai, C.
This study investigates why and how inconsistency in the generation of Large Language Models (LLMs) might induce or exacerbate societal injustice. For instance, LLMs frequently exhibit contrasting gender stereotypes regarding the same career depending on varied contexts, highlighting the arguably harmful unpredictability of LLMs' behavioral patterns. To augment the existing discrimination assessment with the capability to account for variation in LLM generation, we formulate the Prejudice-Volatility Framework (PVF) that precisely defines behavioral metrics for assessing LLMs, which delineate the probability distribution of LLMs' stereotypes from the perspective of token prediction probability. Specifically, we employ a data-mining approach to approximate the possible applied contexts of LLMs and devise statistical metrics to evaluate the corresponding contextualized societal discrimination risk. Further, we mathematically dissect the aggregated discrimination risk of LLMs into prejudice risk, originating from their system bias, and volatility risk, stemming from their generation inconsistency. While initially intended for assessing discrimination in LLMs, our proposed PVF facilitates the comprehensive and flexible measurement of any inductive biases, including knowledge alongside prejudice, across various modality models. We apply PVF to the 12 most commonly adopted LLMs and compare their risk levels. Our findings reveal that: i) prejudice risk is the primary cause of discrimination risk in LLMs, indicating that inherent biases in these models lead to stereotypical outputs; ii) most LLMs exhibit significant pro-male stereotypes across nearly all careers; iii) alignment with Reinforcement Learning from Human Feedback lowers discrimination by reducing prejudice, but increases volatility; iv) discrimination risk in LLMs correlates with socio-economic factors like professional salaries.
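A hedged sketch of the token-probability measurement primitive: scoring one context by comparing next-token probabilities of gendered pronouns with a Hugging Face causal LM. The template, pronoun pair, and `gpt2` checkpoint are our illustrative assumptions, not the paper's exact protocol; PVF aggregates many such scores per profession (over mined contexts) into the stereotype distribution it analyzes.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # any causal LM checkpoint works here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

def stereotype_score(context):
    """Signed score: log P(' he') - log P(' she') for the next token.
    Positive = pro-male. Sampling many contexts per profession yields
    the stereotype distribution a PVF-style analysis works with."""
    ids = tok(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]   # next-token logits
    logp = logits.log_softmax(-1)
    he = tok(" he", add_special_tokens=False).input_ids[0]
    she = tok(" she", add_special_tokens=False).input_ids[0]
    return (logp[he] - logp[she]).item()

print(stereotype_score("The nurse said that"))     # typically pro-female
print(stereotype_score("The engineer said that"))  # typically pro-male
```

Varying the context while holding the profession fixed exposes the volatility the paper targets: the same profession can flip between pro-male and pro-female scores across templates.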
Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values
Jeong, Haewon, Wang, Hao, Calmon, Flavio P.
We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and missingness patterns can depend on group attributes (e.g., gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
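As an illustration of the MIA idea, the sketch below scores candidate splits while routing missing values left or right, so no explicit imputation step is ever needed. Plain misclassification error stands in for the paper's fairness-regularized objective; the function and its details are our assumptions.

```python
import numpy as np

def impurity(y):
    """Misclassification count under majority vote (0 for an empty node).
    Assumes integer class labels."""
    return 0 if len(y) == 0 else len(y) - np.bincount(y).max()

def best_mia_split(x, y):
    """'Missing incorporated as attribute': each candidate threshold is
    tried twice, routing NaNs left then right, and the better routing is
    kept. The paper optimizes a fairness-regularized objective; we use
    plain misclassification error for brevity."""
    miss = np.isnan(x)
    best = None
    for t in np.unique(x[~miss]):
        for miss_left in (True, False):
            left = np.full(x.shape, miss_left, dtype=bool)
            left[~miss] = x[~miss] <= t
            err = impurity(y[left]) + impurity(y[~left])
            if best is None or err < best[0]:
                best = (err, t, miss_left)
    return best  # (error, threshold, route-missing-left?)
```

Because the missing-value routing is chosen per split, group-dependent missingness patterns become part of the learned model rather than being erased by a fixed imputation rule, which is what lets the fairness regularizer act on them directly.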